A/B testing

In this notebook, we're going to look at data from Cookie Cats, a hugely successful smartphone puzzle game developed by Tactile Entertainment.

The game contains a gate that presents purchase options once the player has completed a certain number of stages.

This notebook is intended to help the company decide whether to place the gate at stage 30 or at stage 40, based on data obtained from an experiment.

In the experiment, players were split into two groups: one group encountered the gate at stage 30 (`gate_30`) and the other at stage 40 (`gate_40`).

We could reach a conclusion simply by averaging player retention for each gate position and picking the position with the higher retention, but we also need to check whether the difference is statistically significant.
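
The naive comparison described above can be sketched as follows. This is a minimal illustration on a tiny made-up table, not the real data: the column names `version` and `retention_1` mirror the ones used later in this notebook, while the `toy` frame and its values are invented for the example.

```python
import pandas as pd

# Hypothetical toy data: the values are invented for illustration only.
toy = pd.DataFrame({
    "version": ["gate_30", "gate_30", "gate_30", "gate_30",
                "gate_40", "gate_40", "gate_40", "gate_40"],
    "retention_1": [True, True, False, True, True, False, False, True],
})

# The mean of a boolean column is the share of retained players per group.
retention_by_version = toy.groupby("version")["retention_1"].mean()

# Observed difference in retention (gate_30 minus gate_40).
observed_diff = retention_by_version["gate_30"] - retention_by_version["gate_40"]
print(retention_by_version)
print(observed_diff)
```

A difference like this says nothing about chance variation on its own, which is why a significance check is still needed.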

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import plotly.express as px
In [3]:
df=pd.read_csv('./dataset/cookie_cats.csv')
In [4]:
px.histogram(df,x='sum_gamerounds',log_y=True,marginal="box",color='version')
In [5]:
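px.histogram(
    df,
    x='sum_gamerounds',
    log_y=False,
    marginal="box",
    # Restrict the x-axis to the 2.5th-95th percentile range so extreme outliers do not dominate
    range_x=list(df['sum_gamerounds'].quantile([0.025,0.950])),
    color='version',
)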
In [6]:
px.parallel_categories(df)
In [7]:
gate_30=df[df['version']=="gate_30"]['retention_1'].astype(int).reset_index(drop=True)
gate_40=df[df['version']=="gate_40"]['retention_1'].astype(int).reset_index(drop=True)
In [8]:
gate_30.mean()-gate_40.mean()
Out[8]:
0.005905169787341458
In [9]:
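ss=5000  # number of permutation and bootstrap resamples
perm=np.empty(ss)
boot=np.empty(ss)
observed=gate_30.mean()-gate_40.mean()
# Pool both groups once: under the null hypothesis the group labels are interchangeable
concat=np.concatenate((gate_30,gate_40))
for i in range(ss):
  # Permutation: shuffle the pooled data and split it back into two groups of the original sizes
  perm_m=np.random.permutation(concat)
  perm_30=perm_m[:len(gate_30)]
  perm_40=perm_m[len(gate_30):]
  perm[i]=np.mean(perm_30)-np.mean(perm_40)

  # Bootstrap: resample each group with replacement and record the difference in means
  boot[i]=np.mean(np.random.choice(gate_30,size=len(gate_30)))-np.mean(np.random.choice(gate_40,size=len(gate_40)))
# One-sided p-value: share of permuted differences at least as large as the observed one
pvalue=np.sum(perm>=observed)/len(perm)
print(pvalue)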
0.0382
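
The permutation step above can also be wrapped in a small reusable function. The following is a minimal sketch under stated assumptions, not the notebook's own code: the name `permutation_pvalue`, its `seed` parameter, and the toy arrays are hypothetical, and it uses `np.random.default_rng` instead of the global `np.random` state so that a run can be reproduced.

```python
import numpy as np

def permutation_pvalue(a, b, n_perm=5000, seed=0):
    """One-sided permutation p-value for the difference mean(a) - mean(b)."""
    rng = np.random.default_rng(seed)  # seeded generator (an assumption, for reproducibility)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate((a, b))
    diffs = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle the pooled data and split it into groups of the original sizes.
        shuffled = rng.permutation(pooled)
        diffs[i] = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
    # Share of shuffled differences at least as large as the observed one.
    return np.mean(diffs >= observed)

# Hypothetical toy groups (1 = retained, 0 = not retained).
group_a = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 1])
group_b = np.array([0, 1, 0, 0, 1, 0, 0, 1, 0, 0])
p = permutation_pvalue(group_a, group_b, n_perm=2000, seed=42)
print(p)
```

Note that with `n_perm` resamples the smallest nonzero p-value this estimator can report is `1 / n_perm`, so very small p-values are only coarsely resolved.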
In [10]:
retention_1_seg=pd.concat([pd.DataFrame(perm,columns=['permutation_avg']),pd.DataFrame(boot,columns=['bootstrap_avg'])],axis=1)
In [11]:
# Reshape to long format: one row per resample, labelled by resampling method
retention_1_seg=retention_1_seg.melt(value_vars=['permutation_avg',"bootstrap_avg"],var_name='type',value_name='avg')
In [12]:
fig=px.histogram(retention_1_seg,x="avg",color='type',marginal='box')
fig.update_layout(shapes=[
    dict(
      type= 'line',
      yref= 'paper', y0= 0, y1= 1,
      xref= 'x', x0= gate_30.mean()-gate_40.mean(), x1= gate_30.mean()-gate_40.mean()
    )
])
fig.show()
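
The bootstrap distribution plotted above can also be summarised numerically as a percentile confidence interval for the difference in means. The following is a minimal sketch under assumptions: the function name `bootstrap_diff_ci`, the 95% level, the seed, and the toy arrays are all hypothetical and are not part of the original analysis.

```python
import numpy as np

def bootstrap_diff_ci(a, b, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resampled_a = rng.choice(a, size=len(a), replace=True)
        resampled_b = rng.choice(b, size=len(b), replace=True)
        diffs[i] = resampled_a.mean() - resampled_b.mean()
    alpha = 1.0 - level
    # Cut off alpha/2 of the bootstrap differences in each tail.
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Hypothetical toy groups (1 = retained, 0 = not retained).
group_a = np.array([1] * 60 + [0] * 40)
group_b = np.array([1] * 50 + [0] * 50)
low, high = bootstrap_diff_ci(group_a, group_b, n_boot=2000, seed=1)
print(low, high)
```

If such an interval excludes zero, that points in the same direction as a small permutation p-value; if it includes zero, the data are compatible with no difference.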
In [13]:
gate_30=df[df['version']=="gate_30"]['retention_7'].astype(int).reset_index(drop=True)
gate_40=df[df['version']=="gate_40"]['retention_7'].astype(int).reset_index(drop=True)
In [14]:
gate_30.mean()-gate_40.mean()
Out[14]:
0.008201298315205913
In [15]:
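ss=5000  # number of permutation and bootstrap resamples
perm=np.empty(ss)
boot=np.empty(ss)
observed=gate_30.mean()-gate_40.mean()
# Pool both groups once: under the null hypothesis the group labels are interchangeable
concat=np.concatenate((gate_30,gate_40))
for i in range(ss):
  # Permutation: shuffle the pooled data and split it back into two groups of the original sizes
  perm_m=np.random.permutation(concat)
  perm_30=perm_m[:len(gate_30)]
  perm_40=perm_m[len(gate_30):]
  perm[i]=np.mean(perm_30)-np.mean(perm_40)

  # Bootstrap: resample each group with replacement and record the difference in means
  boot[i]=np.mean(np.random.choice(gate_30,size=len(gate_30)))-np.mean(np.random.choice(gate_40,size=len(gate_40)))
# One-sided p-value: share of permuted differences at least as large as the observed one
pvalue=np.sum(perm>=observed)/len(perm)
print(pvalue)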
0.0006
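
As a rough cross-check on a permutation p-value for a retention difference, one could compare it with an analytical test. The sketch below uses a pooled two-proportion z-test, which is a swapped-in technique that this notebook does not use itself. The function name `two_proportion_z_test` and the counts passed to it are hypothetical and are not taken from the Cookie Cats data.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """One-sided pooled z-test for proportion A being larger than proportion B."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of equal retention.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail probability of a standard normal: P(Z >= z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical counts, invented for illustration only.
z, p_value = two_proportion_z_test(successes_a=190, n_a=1000,
                                   successes_b=160, n_b=1000)
print(z, p_value)
```

With large groups, the z-test and the permutation test would be expected to give similar one-sided p-values; a large disagreement would be a reason to recheck the code.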
In [16]:
retention_7_seg=pd.concat([pd.DataFrame(perm,columns=['permutation_avg']),pd.DataFrame(boot,columns=['bootstrap_avg'])],axis=1)
# Reshape to long format: one row per resample, labelled by resampling method
retention_7_seg=retention_7_seg.melt(value_vars=['permutation_avg',"bootstrap_avg"],var_name='type',value_name='avg')
In [17]:
fig=px.histogram(retention_7_seg,x="avg",color='type',marginal='box')
fig.update_layout(shapes=[
    dict(
      type= 'line',
      yref= 'paper', y0= 0, y1= 1,
      xref= 'x', x0= gate_30.mean()-gate_40.mean(), x1= gate_30.mean()-gate_40.mean()
    )
])
fig.show()